Melange: Components for Cross-Lingual Retrieval

Authors

  • Max Pfingsthorn
  • Koen E. A. van de Sande
  • Vladimir Nedovic
Abstract

We present the final version of our cross-lingual search engine, Melange, and the results obtained by running it on WebCLEF topics in an attempt to solve the Mixed Monolingual and Multilingual tasks. We concentrate on features of the system that are relevant to the CLIR field and can be developed further independently: our data extraction and indexing methods, our language detection module (with an accuracy of 88% on WebCLEF query strings), our PageRank ranking scheme, and query translation.
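The abstract names a PageRank ranking scheme without giving details. As a rough illustration of the general technique only, the following is a minimal power-iteration sketch. It is not Melange's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the toy link graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Generic power-iteration PageRank (illustrative, not Melange's code).

    links maps each page to the list of pages it links to.
    Returns a dict mapping every page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly
        # over all pages (an assumed, common convention).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to each link target.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page link graph, used only to exercise the function.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_ranks = pagerank(toy_links)
best_page = max(toy_ranks, key=toy_ranks.get)
```

In a retrieval setting, a link-based score of this kind is typically combined with a text-matching score; how Melange combines the two is not stated in the abstract.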


Similar resources

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...


Ricoh at CLEF 2004

Abstract. This paper describes the participation of RICOH in the monolingual and cross-lingual information retrieval tasks on German Indexing and Retrieval Testdatabase (GIRT) in the Cross-Language Evaluation Forum (CLEF) 2004. We used a morphological analyzer for word decompounding and parallel corpora for cross-lingual information retrieval. The performance of cross-lingual information retrie...


Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval

Our approach to cross-lingual document retrieval starts from the assumption that effective monolingual retrieval is at the core of any cross-language retrieval system. We devote particular attention to three crucial ingredients of our approach to cross-lingual retrieval. First, effective tokenization techniques are essential to cope with morphological variations common in many European language...


Cross-Lingual Word Representations via Spectral Graph Embeddings

Cross-lingual word embeddings are used for cross-lingual information retrieval or domain adaptations. In this paper, we extend Eigenwords, spectral monolingual word embeddings based on canonical correlation analysis (CCA), to crosslingual settings with sentence-alignment. For incorporating cross-lingual information, CCA is replaced with its generalization based on the spectral graph embeddings....


Cross-Lingual Information Access and Its Evaluation

This paper outlines the first NTCIR Workshop [1], held August 30 to September 1, 1999, the first evaluation workshop designed to enhance research in Japanese text retrieval and cross-lingual information retrieval, and then offers some thoughts on the future directions of cross-lingual information access in the research and development of digital libraries.



Publication date: 2005